@Mahsarnzh Mahsarnzh commented May 6, 2025

Description

This PR adds optional Batch Normalization to the NatureCNN architecture used as a feature extractor in Stable-Baselines3. A new use_batch_norm flag toggles a BatchNorm layer after each convolutional layer, giving users an option that may improve performance and training stability in image-based environments.
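
To make the proposed change concrete, here is a minimal standalone sketch of a NatureCNN-style extractor with a use_batch_norm flag. It is an illustration and not the code in this PR: the class name NatureCNNSketch and the input_size parameter are hypothetical, the class subclasses plain nn.Module rather than the SB3 base extractor so it runs with PyTorch alone, and placing BatchNorm2d between each convolution and its ReLU is an assumption based on "after each convolutional layer".

```python
import torch
from torch import nn


class NatureCNNSketch(nn.Module):
    """Hypothetical NatureCNN-style extractor with optional BatchNorm (illustration only)."""

    def __init__(self, n_input_channels=4, features_dim=512, use_batch_norm=False, input_size=84):
        super().__init__()
        # (out_channels, kernel_size, stride) of the three NatureCNN convolutions
        conv_specs = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
        layers = []
        in_channels = n_input_channels
        for out_channels, kernel_size, stride in conv_specs:
            layers.append(nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride))
            if use_batch_norm:
                # Assumed placement: normalize conv outputs before the non-linearity
                layers.append(nn.BatchNorm2d(out_channels))
            layers.append(nn.ReLU())
            in_channels = out_channels
        layers.append(nn.Flatten())
        self.cnn = nn.Sequential(*layers)

        # Infer the flattened size with a dummy forward pass.
        # eval() keeps the BatchNorm running statistics untouched by the dummy input.
        self.cnn.eval()
        with torch.no_grad():
            dummy = torch.zeros(1, n_input_channels, input_size, input_size)
            n_flatten = self.cnn(dummy).shape[1]
        self.cnn.train()

        self.linear = nn.Sequential(nn.Linear(n_flatten, features_dim), nn.ReLU())

    def forward(self, observations):
        return self.linear(self.cnn(observations))


if __name__ == "__main__":
    extractor = NatureCNNSketch(use_batch_norm=True)
    features = extractor(torch.rand(8, 4, 84, 84))
    print(features.shape)
```

With use_batch_norm=False the layer stack in this sketch is the same conv/ReLU sequence as without the flag, which matches the non-breaking intent of the change. Note that BatchNorm layers behave differently in train() and eval() modes, so results with the flag enabled depend on the mode the extractor is in when it is called.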

Motivation and Context

When use_batch_norm is set to False, behavior is unchanged. When it is set to True, Batch Normalization can mitigate exploding gradients, speed up convergence, and make higher learning rates usable.
More generally, this change lets users optionally enable Batch Normalization in NatureCNN, which can improve training stability and convergence, especially in environments with high variance in pixel input. I initially explored alternatives (LayerNorm, GroupNorm); BatchNorm showed the best trade-off between speed, stability, and convergence.

  • I have raised an issue to propose this change (#2131 for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

Mahsarnzh added 2 commits May 3, 2025 17:17
When enabled via , this stabilizes feature distributions
and reduces internal covariate shift. On Pong, it boosts avg. reward by ~2.3
points at 200k timesteps vs. the default extractor, with all existing tests
still passing.